Semi-naive Bayesian Classification
Authors
Abstract
The success and popularity of naive Bayes have led to a field of research exploring algorithms that seek to retain its numerous strengths while reducing error by alleviating the attribute interdependence problem. This thesis builds upon this promising field of research, contributing a systematic survey and several novel and effective techniques. It starts with a study of the strengths and weaknesses of previous semi-naive Bayesian methods, providing a taxonomy of them and a comparative analysis of their features. Twelve key semi-naive Bayesian methods are benchmarked using error analysis based on the bias-variance decomposition, probabilistic prediction analysis based on the quadratic loss function, and training and classification time analysis on sixty natural domains from the UCI Machine Learning Repository. Results for logistic regression and LibSVM, a popular SVM implementation, are also presented to provide a baseline for comparison. In analyzing the results of these experiments, we offer general recommendations for selection between semi-naive Bayesian methods based on the characteristics of the application to which they are applied. This comparative study supports previous findings of strong performance from Averaged One-Dependence Estimators (AODE), which significantly reduces naive Bayes’ error with modest training and classification time overheads.

Backward Sequential Elimination is an effective wrapper technique to identify and repair harmful interdependencies, and has been profitably applied to naive Bayes. It is therefore surprising that its straightforward application to AODE has previously proved ineffective. In response to this observation, this thesis explores novel variants of this strategy, leading to effective techniques. These eliminate child attributes from within the constituent One-Dependence Estimators, thereby significantly improving AODE’s prediction accuracy. However, due to repeated accuracy evaluation of attribute subsets on AODE, these elimination techniques have very high training time complexity.

In response to this drawback, a new type of semi-naive Bayesian operation, Subsumption Resolution (SR), is proposed. It efficiently identifies pairs of attribute values such that one is a generalization of the other and deletes the generalization at classification time. This adjustment is proved to be theoretically correct for such an interdependence relationship. The thesis demonstrates experimentally that SR can in practice significantly ...
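For readers unfamiliar with the estimators named in the abstract, the following is a minimal sketch in standard textbook notation, not a formulation quoted from the thesis; the training-frequency function F(x_i) and the minimum-support threshold m are notational assumptions added here for clarity.

```latex
% Minimal sketch of the classifiers discussed above (requires amsmath).
\begin{align*}
  % Naive Bayes: attributes assumed conditionally independent given class y.
  \hat{P}_{\mathrm{NB}}(y \mid \mathbf{x})
    &\propto \hat{P}(y) \prod_{j=1}^{n} \hat{P}(x_j \mid y) \\[4pt]
  % AODE: average of one-dependence estimators, each attribute x_i in turn
  % acting as parent of all others. F(x_i) = training frequency of value x_i,
  % m = minimum-frequency threshold (assumed notation).
  \hat{P}_{\mathrm{AODE}}(y \mid \mathbf{x})
    &\propto \sum_{i \,:\, F(x_i) \ge m} \hat{P}(y, x_i)
             \prod_{j=1}^{n} \hat{P}(x_j \mid y, x_i)
\end{align*}
% Subsumption Resolution (SR): if \hat{P}(x_j \mid x_i) = 1 is observed with
% sufficient support, x_j is treated as a generalization of x_i and is deleted
% from \mathbf{x} at classification time, since it adds no information beyond x_i.
```

In this reading, AODE averages over every qualified parent attribute rather than selecting a single model, which is consistent with the abstract's observation that it incurs only modest training overhead compared with wrapper-based approaches such as Backward Sequential Elimination.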
Similar resources
A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors affected by overconfidence, one of the personality traits of individuals, may make irrational decisions that have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression for predicting managerial overconfidence at pre...
A Bayesian mixture model for classification of certain and uncertain data
There are different types of classification methods for classifying certain data. However, the values of the variables are not always certain; they may instead belong to an interval, in which case the data are called uncertain. In recent years, under the assumption that the uncertain data are normally distributed, several estimators have been proposed for the mean and variance of this distribution. In this paper, we co...
Domains of competence of the semi-naive Bayesian network classifiers
The motivation for this paper comes from observing the recent tendency to assert that, rather than a unique and globally superior classifier, there exist local winners. Hence, the proposal of new classifiers can be seen as an attempt to cover new areas of the complexity space of datasets, or even to compete with those previously assigned to others. Several complexity measures for supervised clas...
Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier
The Semi-Naive Bayesian network (SNB) classifier, a probabilistic model that assumes conditional independence among combined attributes, shows good performance in classification tasks. However, traditional SNBs can only combine two attributes into a combined attribute. This inflexibility, together with the strong independence assumption, may generate inaccurate distributions fo...
A Comparative Study of Semi-naive Bayes Methods in Classification Learning
Numerous techniques have sought to improve the accuracy of Naive Bayes (NB) by alleviating the attribute interdependence problem. This paper summarizes these semi-naive Bayesian methods into two groups: those that apply conventional NB with a new attribute set, and those that alter NB by allowing interdependencies between attributes. We review eight typical semi-naive Bayesian learning algorit...